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Abstract. This paper is devoted to pricing American options using Monte Carlo and the 
Malliavin calculus. Unlike the majority of articles related to this topic, in this work we will not use 
localization fonctions to reduce the variance. Our method is based on expressing the conditional 
expectation E[f(St)/S s ] using the Malliavin calculus without localization. Then the variance of the 
estimator of E[f(St)/S B ] is reduced using closed formulas, techniques based on a conditioning and 
a judicious choice of the number of simulated paths. Finally, we perform the stopping times version 
of the dynamic programming algorithm to decrease the bias. On the one hand, we will develop the 
Malliavin calculus tools for exponential multi-dimensional diffusions that have deterministic and no 
constant coefficients. On the other hand, we will detail various nonparametric technics to reduce 
the variance. Moreover, we will test the numerical efficiency of our method on a heterogeneous 
CPU/GPU multi-core machine. 
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Introduction and objectives. To manage CPU (Central Processing Unit) 
power dissipation, the processor makers have oriented their architectures to multi- 
cores. This switch in technology led us to study the pricing algorithms based on 
Monte Carlo (MC) for multi-core architectures using CPUs and GPUs (Graphics 
Processing Units) in [1] and (2|. In the latter articles we basically studied the im- 
pact of using GPUs instead of CPUs for pricing European options using MC and 
American options using the Longstaff and Schwartz (LS) algorithm [3J. The results 
of this study proves that we can greatly decrease the execution time and the energy 
consumed during the simulation. 

In this paper, we explore another method to price American Options (AO) and 
which is based on MC using the Malliavin calculus (MCM). Unlike the LS method 
that uses a regression phase which is difficult to parallelize according to [5] , the MCM 
is a square^ Monte Carlo method which is more adapted to multi-cores than the LS 
method. Moreover, using MCM without localization does not depend on parametric 
regression, we can increase the dimensionality of the problem without any constraints 
except for adding more trajectories if we aim at more accurate results. 

American contracts can be exercised at any trading date until maturity and their 
prices are given, at each time t, by [4| 

(0.1) Pt{x) = sup eeTt T Ek, x (e-^-')$(S e )) , 

where 7*,t is the set of stopping times in the time interval [t, T], E ttX is the expectation 
associated to the risk neutral probability knowing that St = x and r and 4>(St) are 
respectively the risk neutral interest rate and the payoff of the contract. 
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1 What we mean by square Monte Carlo is not necessarily simulating a square number of trajec- 
tories, but a Monte Carlo simulation that requires a Monte Carlo estimation, for each path, of an 
intermediate value (here the continuation) and this can be done by using the same set of trajectories 
as the first Monte Carlo simulation. 
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With Markovian models (which is the case in this article) , to evaluate numerically 
the price (|0.1[) . we first need to approach stopping times in 71, t with stopping times 
taking values in the finite set t = to < t\ < ... < t n = T. When we do this 
approximation, pricing American options can be reduced to the implementation of the 
dynamic programming algorithm [4]. Longstaff and Schwartz consider the stopping 
times formulation of the dynamic programming algorithm which allows them to reduce 
the bias by using the actual realized cash flow. We refer the reader to [5] for a formal 
presentation of the LS algorithm and details on the convergence. In (|0.2[) . we rewrite 
the dynamic programming principle in terms of the optimal stopping times Tfe, for 
each path, as follows 

(0 2) T " = T ' 

Vk e {n- 1,...,0}, r k = t k lA k + Tk+ilA%, 

where the set Ak — {&(St k ) > C(St k )} and C(St k ) is the continuation value whose 
expression is given by 

(0-3) C(S tk ) = E (e- r(t ^-^P tk+1 (St k+1 )\s tk ) . 

Thus, to evaluate the price (10.11) . we need to estimate C(St k ). Algorithms devoted 
to American pricing and based on Monte Carlo, differ essentially in the way they 
estimate and use the continuation value (|0.3[) . For example the authors of |6| perform 
a regression to estimate the continuation value, but unlike [3J, they use C(St k ) instead 
of the actual realized cash flow Pt k+1 (St k+1 ) to update the price in (j0.2[) . Other 
methods use the Malliavin Calculus [7_ or the quantization method [8, for C(St k ) 
estimation. In [2], we implement the LS method because it is gaining widespread use 
in the financial industry. As far as this work is concerned, we are going to implement 
MCM but unlike [7] we use the induction (I0.2[) for the implementation and we reduce 
the variance differently, without using localization. 

Formally speaking, if r = 0, we can rewrite the continuation using the Dirac 
measure £ x {-) at the point x 

, n ,x n( , _ E (P tk+1 (S tk+1 )e x (St k )) _ E (P tk+ AS tk+1 )ls tk >AS tk )7r tk , tk+1 ) 

(U.4) U[X) — — - ,„ y. — 7 r • 

E ^ S ^ E(l Stk > x {S tk )^ tk+1 ) 

The second equality is obtained using the Malliavin calculus and we will specify, in sec- 
tion [2] expression (|2.3|l . the value of n tk , tk+1 by an integration by part argument for the 
Multi-dimensional Exponential Diffusions with deterministic Coefficients (MEDC) 
model 

dS t = S t a(t)dW t , So = y, 

in the case of deterministic non-constant triangular matrix a(t) and when <jjj (t) = 
<Jij5{i — j) with a fixed constant <xy (The latter case will be used as a bench- 
mark). Instead of simulating directly the last term in (|0.4j) . in section |3j we project 
J-St >x(St k )^t k ,t k+1 using a conditioning as follows 



(0.5p(x) = 



E (Pt k+1 (St k+1 )E 


^St k >x(St k )^t k ,t k + 1 


{J tk+1 a^dWi}^^ 


) 


E (E 


lSt k >x(St k )nt k ,t k+1 


{fo k+1 a ij( u ) dW Z}l<3<i<d 


) 
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Then, in section [U we estimate (|0.5[) by Monte Carlo simulation and we use the 
approximation 



(0.6) C(x) 



N' 



i EL Hx, {j^ 1 a l3 (u)dWi}[^ d ) 



l<j<i<d> 

K x dyij}j<i) = E(is tk > x (s tk )n tk ^ k+1 \{j* k+1 (Jiji^dWDiKjKiKd = {yij}i<j<i<d) 

and N ^ N'. Thus, in sectionHJ we provide another method to accelerate the conver- 
gence based on a choice of the appropriate relation between N and N' that reduces 
the variance of the quotient (|0.6[) . Note that, even if one can reduce the variance 
by an "appropriate" control variable, we choose here not to implement this kind of 
method because it is not standard for American options. 

In the last section, on the one hand, we provide the numerical result comparison 
of LS and MCM. On the other hand, we study the results of using the two variance 
reduction methods (|0.5I) and (|0.6[) . Finally, we test the parallel capabilities of MCM 
on a desktop computer that has the following specifications: Intel Core i7 Processor 
920 with 9GB of tri-channel memory at frequency 1333MHz. It also contains one 
NVIDIA GeForce GTX 480. 

Let us begin with section Q] in which we establish the notations, the Malliavin 
calculus tools and the model used. 

1. Notations, hypothesis and key tools. Let T be the maturity of the Amer- 
ican contract, (fi, J 7 , P) a probability space on which we define an d-dimensional stan- 
dard Brownian motion W = (W 1 , W d ) and F = {J-" s } s <t the P-completion of the 
filtration generated by W until maturity. Moreover, we denote by {J 7 ^'"' } s <t the 
P-completion of the filtration generated by (W 1 , W d ) until the fixed time t £ [0, T]. 
The process St models the price of a vector of assets S}, Sf which constitute the 
solution of the following stochastic differential equation ( ' is the transpose operator) 

(1.1) t=( aiit ))>dW t , S*=z 4 , i = l,..,d, 

where a(t) = {(Jij(t)}i<ij<d is a deterministic triangular matrix ({cij(i)}i<j = 0). 
We suppose that the matrix a(t) is invertible, bounded and uniformly elliptic which 
insures the existence of the inverse matrix p(t) = cr _1 (i) and its boundedness. 

We choose the dynamic (jl.ip because it is largely used for equity models, HJM 
interest rate models and variance swap models. Moreover, the case of <Jij(t) = (Jij5(i — 
j) (<Jij is a constant) will be easily tested in the section [5j One should notice that in 
the case where the dynamic of S is given by 



dSj 
SI 



= (a l )'(t,S t )dW t , S l = Zi , i = l,..,d, 



we can use, for instance, the following Euler scheme to reduce this model to the model 

CEQ) 



n-l 



dlog{S l t ) = 2^ Ue[t k ,t k+1 [ 

k=0 



kT 

Sn = Zi, 1 — 1, .., d, ifc = . 

n 
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Note that this scheme does not discretize the process S but the process log(5). 

Throughout this article, we will use two operators: The Malliavin derivatives 
D and the Skorohod integral S and we define them in the same way as in [5J. For 
a fixed m G N, we define the subdivision {t k n }k<2 m of the finite interval [0,T] by: 
t k m = kT/2 m . Then we introduce S(K 2m ) the Schwartz space of infinitely differentiable 
and rapidly decreasing functions on R 2 . Let / G <S(R 2 ), we define the set 6 m of 
simple functionals by the following representation 



F G 6 m <&F = f (w t i m - W t o m , W t 2 m - W t i m , W t < 



W 4 



One can prove that 6 = UmeN ® m ^ s a linear and dense subspace in L 2 {yi) and that 
the Malliavin derivatives D l F of F G © defined by 

fe=0 

represents a process of L 2 (ft x [0, T]) with values in L 2 ([0, T]). We associate to & the 
norm || • 1 1 x 2 defined by 



\F\\l 2 =E\F\ 2 +j2E l T {D\Ffdt. 
i=i Jo 



Finally, the space D 1 ' 2 is the closure of © with respect to this norm and we say that 
F G B 1 ' 2 if there exists a sequence F m G © that converges to F in L 2 (f2) and that 
D u F m is a Cauchy sequence in L 2 (fl x [0,T]). 

Now we use the duality property between S and D to define the Skorohod integral 
S. We say that the process U G Dom(S) if VF G D 1 ' 2 



^ U t ■ D t Fdt \ < C(l/)||F|| li2 , 



where C(U) is a positive constant that depends on the process U. If U E Dom(8), 
we define the Skorohod integral S(U) — J U t SW t by 

(1.2) VF G B 1,2 , E (^ F J q U t ■ 5W^j = E (FS(U)) = E (j^ U t -D t Fdtj, 
(•) is the inner scalar product on R d . 

Below, we give some standard properties of the operators D and <5: 

1. If the process Ut is adapted, S(U) = J UtSWt coincides with the Ito integral 
/ U t dW t . 

2. The Chain Rule: Let F = (Fi, F 2 , F k ) G (B^ 2 ) fc and : R k -> R a 
continuously differentiable function with bounded partial derivatives. Then 
(f>(Fi, F 2 , Fk) G B 1 ' 2 and: 

D 1 ( F 1 , F 2 , . . . , F k ) = ^ ( » Fi , ■ ■ ■ , Fk ) D t F . 

i—1 
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3. The Integration by Parts: The IP formula will be extensively used in the next 
section on the time intervals I = (0, s) and / = (s,t) with s < t g]0,T]: Let 
F £ D 1,2 , an adapted process U £ Dom(S) and that FU £ Dom(5). For each 
1 < i < d we have the following equality 

(1.3) J FU u S l W u = F U u dW l u - J U u D l u Fdu. 

To simplify the notations, we denote Hi(S\) = H(S l s — Xi) for the heaviside function 
of the difference between the i th stock and the i th coordinate of the positive vector x. 

Throughout this article, we will suppose that g £ £/, is a measurable function 
with polynomial growth 

£ b (R d ) = {fe M{R d ) : 3C> and m £ N; f{y) < C(l + \y\ d ) m )} , 

where A4(R d ) is the set of measurable functions on M. d . The elements of the set £{,(K. d ) 
satisfy the finiteness of the expectations computed in this article. 

2. The expression of the continuation value . The first theorem of this sec- 
tion provides the expression of the continuation (|0.3p when using Malliavin calculus 
for MEDC models. This theorem can be considered as an extension of the log-normal 
multi-dimensional model detailed in [7J. In Theorem 12. 4[ we provide the expression of 
rj t , introduced in Theorem 12. 1[ without using Malliavin derivatives and this expres- 
sion can be computed using the relation (|2.15l) . The last theorem is a special case of 
the first one because we take o~ij(t) = o~ij5(i — j) (iJij is a constant) that will be used 
to test numerically our nonparametric variance reduction methods detailed in section 
[3] and section |U 

Theorem 2.1. For any s £}0,t[, g £ £ b and x > 



(2.1) E[g(S t 
with 



S s 



(2-2) T Stt [f}(x)=E[f(S t )T Stt H 



T s ,t\g\{x) 
T.,t[l]0«0' 

' H k {S k s ) 



k=l 



qk 



where r Sj t = t and Tl t can be computed by the following induction scheme F d t 
n d f, fork £{!,..., d-1}: It t = T^^f - J^ =k+1 J* D^f D^J du wtih ' 



d r i i 

"Vt = 1 + Y] / (Pjk(u)dW£, Vjk{u) = -Pjk(u)l ue ]o,s[ - Pjk(u)l ue ] s ,t[, 

. . .In S I — S 



j=k 

where p is the inverse matrix p(u) = cr _1 (u). 

From this theorem the value of TTt k ,t k+1 m (|0-4p is given by 

d 1 

(2-3) Kt k ,t k+1 = T tk ,t k+1 Y\_ ~qT- 

i=l '* 

To prove Theorem 12. 1[ we need the following two lemmas which are proved in the 
appendix. Lemma 12.21 expresses the independence of the sum X^i=fc Pik(u)D t u g(St) 
from the variable u. 
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Lemma 2.2. For any u e]0,t[ and f G C 1 (R d ) then 

d 

(2.4) = S^d x J{S t ), p(u) = a- 1 ^). 



The following lemma constitutes with equality (|2.4[) the two keys of the proof of 
Theorem O 

Lemma 2.3. For any I c]0,i[, ft, € C^°(R), a; 6 and F £ D 1 ' 2 , we have 

E (fi^B$r du ) = ^(his^FEUljPMdwi) 

- E(h{S k s )Y* l = k ) I Pik{u)D\ 1 Fdu), 
where p is the inverse matrix p(u) = cr _1 (u). 

Proof of Theorem l2.ll To prove Theorem l2.H it is sufficient to prove the following 
recursive relation on the parameter k for each hi £ C£°(R) and / £ C 1 (K d ) n £b(M. d ) 

(2.6) E (f(S t )f[h>(Sl)) =E(f(S t )T k s yf[h>(Sl) f[ Mp) ) . 

V i=i / V t=i »=fe+i s / 

Indeed, if it is the case then by density of © in L 2 (0), one can approximate f(St) £ 
L 2 (fi) by F m £ & and pass to the limit on the left and on the right term of (|2.6[) 
using Cauchy-Schwarz inequality and the dominated convergence theorem. Let us now 
consider the singularity due to the heaviside, let <f> £ C£°(R) be a mollifier function 
with support equal to [—1, 1] and such that J R (f>(y)dy = 1, then for any y £ K we 
define 

h mk {y) = {H k * 4> m ){y) £ C b °°(M), 4> m {y) = rrr^m^y). 
If the equality (|2.6p is correct for any k, then 

(2.7) e i f(s t ) n \=e[ f(s t )v s , t n hnMS '~ 



fc=l / \ k=l 



On the one hand, h m k(y) converges to H k (y) except at y — and the absolute 
continuity of the law of S k ensures that h mk (S k ) converges almost surely to H k (S k ). 
Using the dominated convergence theorem, we prove the convergence of h mk (S k ) to 
H k (S k ) in L p (Vt) for p > 1. By Cauchy-Schwarz inequality, we prove the convergence 

h m k(S k )\ ( ,, OVp A H k {S k ) \ 



fc=i s / V fc=i s 



On the other hand, h' mk (y k ) = J R H k (z k )cj)' m (y k - z k )dz k = (j) m (y k - x k ). More- 
over, we observe that, according to our assumption, the distribution of the vector 
(Si, Sf , S}, Sf ) admits a density with respect to the Lebesgue mesure on R d xR d 
we denote it by p(jj,z) with ?/ = (y±, j/^) and z = (zi, 2d), thus 



E(f(S t )f[h' mk (S k )) = [ f(z)( [ n 



4>m{yk - x k )p(y, z)dyi...dy d dz 1 ...dz d 
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Because / Rd rifc=i 4>m{yk - x k )p{y, z)dyi...dy d converges to p(x, z), we have 
E^f(S t )j[h' mk (S^ ^ E (f(S t )j[s Xk (S^ , 

which concludes the first part of this proof. 

To prove the induction (|2.6[) . we introduce the following notations: 

H( x ) = II ■ ft'kO 6 ) = IJ^(^i), ie = (xi,...,^). 



The case fc = c? is given by 



(f(S t )h' d {S a )) = E (l J Q S f(S t )h'^i{S B ) D %£f ] du) 
= E (Uof(S t )h' d -i(S s )^g^du 



where we replaced h' d (Sf) by u ^dgd 3 hi the nrs t equality and D a S^ by its value 
Vdd(v>)Sg in the second equality. Using Lemma 12.31 with 



and the fact that h'd-i(S s ) does not depend on the d th coordinate of the Brownian 
motion yields 



dSs)f(St) du 



E(Uom)h' d ^(Ss)^$P) 

(2-8) = E (Fh4Sf)Uo ^) - E (h d (Sf)Uo Dt'^ d ^TJ 

Besides using Lemma l2~2l for the Malliavin derivative of f(St), we get for v G]s,t[ 



f(St) 



CTdd{u) 

Thus, the value of the last term of ()2.8[) is given by 

E (h>d- l {S s )h d {Sf)\$°Dt^ 1 ^)=-E (/TViOS.MS?)^) 

And by duality (|1.2p we remove the Malliavin derivative of f(St) in the last term of 
the previous equality 



E 



( h' d -i(S s )h d (Sj) i ft Dtf(S t )dv \ _ p / , £d = iWMgl) F J 1 f* £jf{St)dv T \\ 

'ftWSjWS?) p / f /Q \ 1 ft dW* \\ 

ga ^{/(*t)— J s -Fsj) 



8 



Regrouping all terms together 

e (f(s t )h' d (s s )) - e (f(s t )Ti t h' d ^(s s )h d d (s s )) , r* it = <f . 

Let us suppose that (|2.6[) is satisfied for k and prove it for k — 1, thus 
E (f(St)h' d -i(S a j) = E(f(S t )r k s fhi +1 (S s )h'k(S s )\ 

p/i r« /(■St)r;+ 1 h^ +1 (5,)ft> fc _ 1 (s a ) o^) , 

I s JO SJ fffcfc(tt) 

where we replaced h' k {S k ) by ^gj" J in the second equality. Using Lemma |2"U1 with 

s k 

and the fact that h'k-i{S s ) does not depend on the j th coordinate (j > k) of the 
Brownian motion yields 

(2-9) 

- E?= fc E [h k {S k )h' k ^{S s )\ £ Dl 



f(s t )h d k+1 (s s )r k s + 1 



p jk (u)du 



Besides, if for x = (xi, ...,Xd) we denote H(x) — hk+ J~^ , the Malliavin derivative of 
the last term of (|2.9p provides 

Di[T k s + 1 U(S s )f(S t )]=DiT k + 1 U(S s )f(S t ) + T k f Djn(5 s )/(5 t ) 

Using Lemma |2~21 for the Malliavin derivative in the two last terms, we get 

d 

(2.io) Y, pMu)Diu(s s ) = s k d Xk n(s s ) = -n(s s ), 

d 

(2.H) 5>,-*(«)^/0St) = s k d x j(s t ). 

From (|2.10p . we deduce that 

h' k - 1 (S s )hi(S s )f(S t )T k + 1 



h' k - 1 (S s )h k 



S JO - u 



Thus, introducing the random variable ir k 'f = 1+X^ ._ k Jg pj k {u)dWl and using 
EiUo^^du) = E (Fh k (S k )^) 

(2.12) - ^ f^V^^ Jo ^M^ 1 *.) 

- £ f ^'^-p-nt 1 -1- ;« Pjk (u)Dif(S t )du) , 



where we used the fact (|2.11l) that Y^,j= k Pjk{ u )^i,f{^'t) does not depend on u. Let 
us develop the last term of (|2.12[) 



E 



h d k (s s )9 k ^ 1 (s s )r k + t 1 



in 



= E 
= E 



+ ft EU Pjk(u)Dtf(S t )d 

'' ,h ' k - liSs) EU E \th ft ^f Pjk {u)Dif{S t )d 



si 



.E\ 



_ ^ e ^M^liiA ^jl Pjk {u)DiT<lfdu) 

We applied (|1.2[) in the third equality to remove the Malliavin derivative of f{St). We 
also used (|1.3[) in the last equality. To complete the proof, we should remark that 

- f DiT k s yp ]k {u)du--^ f ' DiT k s fp jk (v)dv = - f DlTtfD^dy 

s JQ IS J s Jq 

and because is an J r t fe+1 ' "'^-measurable random variable 

j^f DlX^D^du = T^^- J2 fDiT k s fDi<n 
,-,„ Jo , . , n Jo 



-pk pfc+1 k,d 



du. 



j=k ' 



j=k+l • 



□ 

Theorem 12.41 provides the expression of T k t in (|2.16p without using the Malliavin 
derivatives {D 3 u }j >k and which can be efficiently computed using (|2.15l) . We will use 
in Theorem 12 .41 the set of the second order permutations S k d defined as the following 



(2.13) 



Sk,d = {?6 Sk,d, pop = Id}, 



where S k ,d is the set of permutations on {k, ...,d} and Id is the identity application. 
By induction, one can easily prove that 

(2.14) s ktd = {4 o P , P e Sfc +M } u {t{ o p , p e s k+hd , P (i) = l,le{k + 1, d}}, 

with r\ : t 4 j as the transition application on {k,...,d}. We also denote by A 
the determinant that involves only the permutations of S ki d, that is to say, the A 
associated to the matrix C = {Ci.j} k <i.j<d is given by 

d 

A = E <P) 11^(0 
Using f|2 . 14[) , we can easily prove that 

d 

(2.15) A = C fc:fc A fe , fe + Yj e(4)C i>k C k>i A kJ 

i=k+l 

where Afc^ is the A associated to the C l,fc obtained from C by suppressing the line and 
the column i as well as the line and the column k. Based on the development according 
to the first line, relation (|2.15p provides a recursive formula even more efficient than 
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the determinant formula. Of course, we can generalize the relation (12.15P to the one 
that involves the development according to a j th line or a j th column with k < j < d. 

Theorem 2.4. Based on the assumptions and the results of Theorem \2.1l for 
k E {!,..., d} the value ofT^ t is given by 



( 2 -!6) 1"^= ^ e (p) A k >P {k) A k+l,p{k+X)- A d,p{d) = Y e (P) II^PW' 

P65 fcid p£S k>d l=k 

with e(p) as the signature of the permutation p £ Sk,d, Sk,d defined in i2.13\) and 



( ^I't Cj.,2 Ci, 3 
V 2,d n 

1 7T s ,i ^2,3 



.4 



C2,d 



i d— l,d s~i 

1 7T C t Cd-l,d 



V i i 



>s,t 
1 



d.d 



J 



where Ck.i is the covariance ofn s ' t and tt 1 ^. 

Proof. We prove (|2.16p by a decreasing induction. For k = d, the expression 
(|2.16|) is clearly satisfied. We suppose that (|2 . 1 6|) is satisfied for k + 1 and we prove 
it for k. According to Theorem O r* t = rJ+V^ - ^U+i Jo D L T sV D i^fdu, 
but 

fc+i 



^u 1 s,t 



— J2l=k+1 J2p£S k + 1 , d ,p(l)=l £ ^ Yli=k+l,i^l A i,p{i)D 3 u A l,l> 

the second equality is due to the fact that A i,p(i) is a constant except for p(l) = I. 
Subsequently 

-T,U+ifo D i T U lD i^:i du 

= - Ef= fc +i T,pes k+ i, d , P [i)=i 6 (p) n<=fc+x,t#/ Ap« £j=fe+i Jo D L A i J D i n . 

~ J2l=k+1 J2p£S k+1 , d ,p(l)=l e (P) rii=fe+l,i^/ A i,p(i)Ck,l- 

Finally 



k.d 



A-U , 1 JO 



j = k+l 

d d d 

= n s't y e(p) n ^.p( i ) ~ $z ^ e ( p ) n Ai >pd) 

peS k+1 , d i=k+l l=k+l p£S k+1}d ,p(l)=l i=k+l,ijtl 

d 

= Y e (P)]J A i,p(i)- 

p£S k , d l=k 

The last equality is due to the development of X)pes fc d e (p) T\i=k A i,p(i) according to 
the k th line of A which can be justified by (|2.14[) . □ 
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As a corollary of Theorem 12. II and Theorem 12. 4[ we obtain the following result. 
Corollary 2.5. For any s e]0,t[, g € £ & and x > 0, i/ <7y(t) = cr.y<5(z — j) </ien 



5 S = a; 



with 
(2.17) 

and 



T s , t [f](x)=E[f(S t )f[ 



T s ,t\g\{x) 



H k (S k s )W* 



<7 kS (t-s)S* J ' 

IU s fe t = (t- s )(VK s fe + «7 fcS ) - s(W t fc - W*), = 1, 



3. Variance reduction method based on conditioning . In this section, we 

show that one can reduce the variance by a projection on L 2 ^| J* aij(u)dW^ ^ 

and by using a closed formula of T s , t [l](x). Like in section [5J we give in Theorem 13. II 
the results of the special case o~ij(t) = o~ij5(i — j) {uij is a constant) that will be used 
to test our variance reduction method. 

We begin with T Sjt [l](a;), we can compute the explicit value of this function of x. 
The T s j[l](x) closed formula can be got, for instance, from a change of probability. 
Indeed, we define the probability P = N coeff ([\ k=1 S$/S*)P which yields 



r.,t[l](aO = ~jrj~ e 

Ncoef f 



\{H k {S k s) 



k=l 



N coe f f is a deterministic normalization coefficient such that M s = N coe f f (J], j Sq /S^) 
is an exponential martingale with E(M S ) = 1. Under P, r s t has the same law as a 
polynomial of Gaussian variables which is sufficient to conduct the computations. 

Let us now denote 

Kx, {ya}j<i) = e (r., t J] { /' ^(ujdw^l = {wj}!^^ j 

\ fe=l s ^0 J l<j<i<d / 

In what follows, we are going to prove that the function h(x, {yij}i<j<i<d) can be 
explicitly known if, for each j, the (d — k) x (d — k) matrix Sj t = {E||} ^ fe<d = 

■j J* CTij (i^ct/cj (u)dw > is invertible. First, please remark that according to our 



notations i — j + 1 and k — j + 1 are the indices of the element Y, l h in the matrix 



j<i,k<d 

' " ' 1 ; - ; 1 ' 1 1 <- m me matrix 

(we will use similar convention also for A 7 ' , , 4^ and $jt). Also we notice that the 
condition of invertibility of £ Jt is not an important constraint, because one can choose 
a time discretization {t m } such that the matrices {Y^j tm } k<d fulfill this condition^]. 



Nevertheless, this is a difficult task when the dimension is sufficiently big. 
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The computation of h{x, {yij}i<j<i<d) is based on a regression of Gaussian vari- 
ables according to the Gaussian variables Yy — J Q crij(u)dW^. First, we perform a 
linear regression of J * ipjk(u)dW 3 t according to Yy 

(3.1) f Vjk {u)dwi = J2 a ik Y a + x ik , 

with {Xjk} 1<k< j <d as a Gaussian vector Af(0, Cx) which is orthogonal to Y. Using 
Ito isometrylwice and the orthogonality of Y and X, we obtain 



sQf cp jk (u)dWlYi^j = (p jk (u)(Ti j {u)du = Y^Y, 



3=k 



If we denote A 3 = 



A3 = { a i : k}j<i,k<d and = | /J cpj k (u)aij(u)duj , we get 



A 3 = Ej t l * it . 



In the same way, we perform a linear regression of a k j(u)dWi according to 
(3-2) f <r fcj (u)dW2 = Kk Y » + Z kj > 

with {Zkj} 1< j <k<d as a Gaussian vector j\f(0,Cz) which is orthogonal to Y. Using 
Ito isometry twice and the orthogonality of Y and Z, we obtain 

E (J' <J kj (u)dWlY t ^ = J' a M {u)^)du = ]T E&&£ fc . 

* — J 

If we denote B 3 = {b J ik }j<i,k<d, we get 

B 3 = T,j t Y,j s . 

Now using (|3.ip . (|3.2p and the value of A and B, the covariance matrices Cx, Cz and 
Cxz = E(XZ) are given by = f Q tpji(u)(pjk(u)du) 

[c x ]i tk = S(^^ Jfc ) = - (4)'*} t - (A 3 )'^ t + (Ai)% t A 3 , 
[C z ]{ k = EiZtjZu) = ^ - (BD'^ S - (B 3 )'Y* S + {B{)% t Bl 

[Cxzi, h = E{X 3l Z k] ) = - (4)'£} s - {B{)'*% + (Ai)% t B{. 

Using (|3.ip and (|3.2p . we express T s , t and 5s according to Yy, Z^ and X# then we 
conduct standard Gaussian computations to obtain the expression of h(x,yij) 0. In 
Theorem 13. 1[ we give an explicit expression of T S) t[l](x) and h{x,yij) in the case of 
multi-dimensional B&S models with independent coordinates. 

We can see that now that we know the explicit value of T S}t [1] (x) and h(x, {yij}i<j<i<d), 
subsequently, we should choose between the simulation of: 



3 One can use Mathematica to compute it formally. 
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PI) N paths of g(St)h (x,{Jq aij(u)dWl}i t j^ then set the continuation to the 

i Eli 9\S t )h (x, {J* a l3 {u)dWi}[ t 



value 



C{x) 



l<j<i<d 



r.,*[i](s 



P2) N' paths of g(S t )h (x, {J* a i:j (u)dWl}i d ^ and N paths of h (x, {J^ a l3 {u)dWl}^ 
then set the continuation to the value 



C(x) :-- 



W E£x 9\S t )h (x, {Si a tJ (u)dWl}[ 



<j<i<d 



<j<i<d 



Based on a variance reduction argument, Theorem 14.31 will indicate the preferable 
method to use. 

Theorem 3.1. For any s G]0,t[, g € Eh and x > 0, if o~ij(t) = <7i 3 8(i — j) then 
the conditional expectation given in Theorem \2.5\ can be reduced to 



E 



(g(s t ) 



s s 



e (g(S t ) uLi ^ ex P + w t 



(d 2 k(W t K )+m k ) 
2 



with 



= °iM ^ 7 — , d 2k (W t k ) = 



s(t - s) 



sW t k 



dik = 



where 



Pk = — In 

Ok 



~x k ~ 


4) 


ok 

L^o - 





Proof. We simplify the constant o>s(t — s) in (|2.17p from the denominator and 
the numerator of the conditional expectation E (^g(St) S s — x^j , then we use the in- 
dependence of the coordinates to obtain 



E 



(g(s t ) 



S, = x) = 



Il C LiE(H k (S*)WyS*) 



Afterwards, we use the independence of the increments to obtain 

E (^f&W**) = E (m^[(t - s)(W k + a k s) - s(W t k 
= (t - s)E (^f^ s fe + a k s)\ - sE (^f 1 ) E(W t k 
= (t-s)E^^l(^G + a k s)) , 



W k )] 
W s k ) 



where the random variable G has a standard normal distribution. Moreover we have 
the following equality in distribution 
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Computing the expectation we obtain 

(3.3) E (^p-(^G + * k sfj = ak (t-s)^e-^, 

with a k — e CT * s . 

Regarding the numerator, we condition according to W k — w k and we use the 
independence of coordinates 

E \9(S t )j[H k (S*)WZ t /S* \ =E\g(S t )j[h k (W t k ) ] j , 

with 

(3-4) h k (w k ) = E (^H k (S k )W k t /S k \w k = w k ) . 

Knowing Wq = and W k — w k , when we fix s the random variable W k = ^§ — h 
'■^Y^-G and G has a standard normal distribution. Also, we have the follow- 
ing equality in distribution for W k t : W k t = (t — s)a k s + y/ts(t — s)G and S k = 

S% exp ^~4 S + CT fe^f- + ^^^j^G^j . Then we compute (JOD which yields: 

to K s u t k\ ■ ts(t-s) (sa k fsa k k \ (d 2k (w k ) + m k ) 2 \ 
(3.5) /lfe (^)= afc y__ eX p^- r (— +w ) j, 

2 

with a k = e CTfcS . 

Using (|3.3|) and (|3 . 5[) we obtain the requested result. □ 



4. Advanced variance reduction method. We present, in this section, a less 
intuitive idea of variance reduction that is based on an appropriate relation between 
N and N' in (|0.6|) . This method can be applied independently from conditioning 
detailed in previous section. 

Lemma 4.1. Let (X k ) k £Pi* be a sequence of independent R™ -valued random vari- 
ables that have the same law. We suppose that X k is square integrable and we denote 
/i = E(X k ), C h3 = Cov(X h Xj). Let r > 0, V M = {x G R", ||x - ixlk™ < r} and 
g : R n — > R such that g 6 C l {V^), then we have the following limits when N — > oo 



g(X N ) — >g(ji) a.s., VN(g{X N ) - gfr)) — ► Af(0, S) inlaw, 

such that 



Proof. The almost sure convergence of g{X^) results from the law of the large 
numbers and from the continuity of g in /i. For the same reasons, the gradient vector 
fgCXw) converges a.s.Jo ff (/x). Besides \/~N(g(X n) - g{n)) = §&(Xn) ■ Vn(X n - 
H) + y/N(X N - n) ■ e(X N - n) and using the Slutsky Theorem, with G ~ Af(0, C) 
. (|f (X N ), VN(X N - //)) converges in law to (§§ (/x), G). 
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• (e(Xjv — m); vN(Xn — m)) converges in law to (0, G). 
Finally, because [x, y) i— > xy and (x,y) t-¥ x + y are continuous, then \^N(g(X n) 
g(fJ,)) converges in law to |^(/i)G. □ 

Let us denote Q as the quotient given by 



1 sr^N' v 

T 7 l^i=\ A 



(4.2) Q = ^ 



TV 2-ii=l *i 

If |J5(ii)| > e > 0, according to Lemma |4~T1 converges to E(Xi)/E(Yi). In the 
following two theorems we will prove that we can accelerate the speed of convergence 
when acting on the relation between N and N'. We analyze the two cases: 
case 1: N' = XiN with Ai € [1/N, 1] and we normalize Q4.20 



Q 



TV, TV' 



1 V JV' y. TV-TV' ^N-N' v \ \ X B N > + (1 - Ai)B 

TV ^JV' Z^i=l 2 « "T" N-N' Zjj=1 V 

where 

^ iV' tv' - N—N' 

2—1 2—1 ?'— 1 

We set <?i(a;, y, z) = x/(Xiy + (1 — Ai)z) and (|4.ip provides 

1 / A 2 9\ A 

(4.3) EifAx) = ^ (( 2A ? - 2A i + + ~^-^2P 

with A = E{X), B = E(Y), u\ = Var(X), aj = Var{Y) and p = Cov(X,Y)/{a 1 a 2 ). 
case 2: N = \ 2 N' with A 2 G [I/AT', 1] and we normalize flU) 



J- f H S^ N Y -J- N'-N sr^N'-N y 
N> \N i-ii=\ ^ l ~r TV'-jV ^> 



^ > - .. .. - , _ \ 2 An + (1 — X 2 )An'.n 

TV Zji=l *« JV 

where 

TV ^ TV'-TV ^ TV 

j4/v = 7 AT, , A AT' AT — 7 A^/, Bat — 7 Y; . 

2—1 2—1 2—1 

We set g 2 (x,y,z) = (\ 2 x + (1 — \ 2 )y)/z and (|4.ip provides 

i / A 2 9\ A 

(4.4) £ 2 (A 2 ) = ^ f (2A| - 2A 2 + l)<r? + - -J-a^p 



with A = £(X), S = E(Y), erf = Var(Ar~), erf = 7 ar (y) and p = Cou(X,y)/(cr 1 cr 2 ). 
Theorem 4.2. Based on what we defined above: 
1. If A 2 o~\ > B 2 o~\ the minimum variance E mln = Ei(A" MTl ), mt/i 

Ar" = ^ + f^, Ex given in (O. 



1G 



2. If A a\ < B a\ the minimum variance S mjIl = S 2 (A™ 4 ™), with 
X 2 m = l + ^, ^2 given in g^. 

Proof. We almost proved this theorem, indeed, one can easily verify that A™" is 
the minimum of Si(Ai) and A™" is the minimum of £2^2). To conclude we verify 
that Si (A) < £ 2 (A) if and only if A 2 a\ > B 2 a\. □ 

What is really appealing, in this theorem, is the fact that even if p = 0, one 
should use N = (1/2)N' or N' = (1/2) N depending on whether A 2 a\ > B 2 a\ or not. 
Nevertheless, in order to apply the results of either this theorem or Theorem 14.31 we 
should have a "sufficiently good" approximation of a\, 02, A, B and p. With our 
model B = T s j[l](x) is explicitly known and we can have a% in the same fashion as 
T Sit [l](a;). In section [SJ procedure P2 is implemented by using the closed expression 
of B and o~2 and simulating ax, A, p to get an approximation of A™ m or A™™ that we 
use to re-simulate Q. In the case where B and cr 2 are not known, we can implement 
one of the two methods that are also efficient: 

Ml) Using all the simulated paths N max , we approximate the values of a±, cr 2 , A, 

B and p then we compute A™ m or A™" that we use to re-simulate Q. 
M2) A fixed point alike method: Using all the simulated paths N max , we approx- 
imate the values of a%, a% , A, B and p then fix a threshold e and test the 
condition A 2 a 2 > B 2 a\: 

If A 2 a 2 > B 2 a\: Use the previous approximations except A that will be 
simulated using AiiV max paths, such that A™ m is reached when 



1 Ba lP 



2 2Aa 2 



< e. 



If A 2 a 2 < B 2 a\: Use the previous approximations except B that will be 
simulated using A 2 -/V maa; paths, such that A™" is reached when 



1 Aa 2 p 



2 2Bai 



< e. 



In the following theorem, we will answer on whether we should implement the simu- 
lation procedure PI or P2. 

Theorem 4.3. Based on what was defined above and on the values of A™ m and 
A™ 1 ™ given in Theorem \4-2\ if 

1. A 2 al > B 2 a\ and 1 > p > |^ (^|^) then (B 2 Ei(A7 m ) - °i) < 0. 

2. A 2 a 2 < B 2 a\ and 1 > p > %l (J* + _ |) then (i? 2 E 2 (Ar") - of) < 0. 

Proof. 

1. If A 2 a\ > B 2 a\: Ei(A™ in ) = Ei(p) then we look for the condition on p that 
allows that the trinomial (S 2 Ei(p) — af) is negative. 

2. We go through the same argument as in 1. 

□ 

Theorem 14.31 tells us that, even though we can compute explicitly the expression of 
T Sj t[l](a;), according to the correlation, one can accelerate the convergence when using 
the quotient of two Monte Carlo estimators. 
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5. Simulation and numerical results. In this section we test our simulations 
on a geometric average payoff that has the following payoff 



In addition, we will test the American put on minimum and the American call on 
maximum that have the following payoffs 



(5.2) ^ min (S T ) = (K-min{S^,S^)) $ m ax{S T ) = (max(4, S£) - K) 



The parameters of the simulations are the following: The strike K = 100, the maturity 
T = 1, the risk neutral interest rate r — ln(l.l), the time discretization is defined 
using the time steps that is given as a parameter in each simulation, Sq = 100 and 
a ij(t) = a ij(t)d{j ~ i) with an — 0.2. The model considered is a multidimensional 
log-normal model 



All the prices and the standard deviations are computed using a sample of 16 simu- 
lations. Besides, the true values, to which we compare our simulation results, are set 
using: 

• the one-dimensional equivalence and a tree method [10], available in Premia 
POP, for options with Q^^St) as payoff, 

• the Premia implementation of a finite difference algorithm [12] in two dimen- 
sions for $ mm (S T ) and $ m ax(ST)- 

In Figure 15.11 we compare the P2 (N ^ N') version of MCM with a standard 
LS algorithm. The LS is implemented using linear regression for multidimensional 
contracts and using up to three degree monomials for the one-dimensional contract. 
The reason behind the choice of linear regression in the multidimensional case is the 
fact that the regression phase of LS can really increase the execution time without a 
significant amelioration of the prices tested. 

In Figure 15. 1[ even if all the prices are sufficiently good, we see that MCM pro- 
vides better prices than those of LS. Also when we increase the time steps, MCM 
is more stable than LS. However, for n — 10 and time steps > 10, we remark that 
one should simulate 2 14 trajectories to stablize MCM. This fact is expected due to 
the important variance of the ten dimension contract and that one should simulate 
more trajectories, on the one hand, to have an asymptotically good approximation 
of the relation between N and N' and, on the other, to have a sufficient number of 
trajectories for the approximation of the continuation. The executions of MCM and 
LS with 2 10 trajectories are carried out in less than one second. Moreover, using 2 14 
trajectories the LS and MCM are executed within seconds (< 5s). As a conclusion 
from this figure, MCM provides better results than LS in approximately the same 
execution time. When we increase the simulated trajectories to 2 14 , the MCM prices 
are stabilized for high dimensions and are always better than LS prices. 



(5.1) 




= rdt + {aij'dWu S i0 = yi, i 



= 1 



,..,d. 



't 
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1 Dimension and 2 10 Trajectories 1 Dimension and 2 14 Trajectories 
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Q- O 
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Fig. 5.1. MCM Vs. LS for &g eo (ST): PR is the real price. PM and PL are the prices obtained 
respectively by MCM and LS represented with their standard deviations. 



In Table [531 we remain with the same payoff &g eo {ST) but this time we compare 
the different nonparametric methods of implementing MCM. In P2(=) and P2(Opt), 
we use the same P2 method but with N — N' for the first one and N ^ N' for the 
second (The relation between N and N' is detailed in pages 16 and 17). First, we 
remark that P2(=) is not stable in the multidimensional case and can give wrong 
results if the time steps > 10. However the P2 method is stabilized when we im- 
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plement the version N 7^ N' of the advanced variance reduction method detailed in 
section SJ Also when we use 2 10 trajectories, PI and P2(Opt) are almost similar. 
Nevertheless, with 2 14 trajectories, P2(Opt) outperforms PI which indicates that we 
fill the conditions of Theorem 14.31 and we have an asymptotically good approximation 
of the relation between N and N'. As far as the execution time is concerned, the time 
consumed by P2(Opt) is not much different from PI when we use 2 10 trajectories. 
In addition, using 2 14 trajectories, the computations of the relation between N and 
N' can be performed on the CPU when the rest of the simulation is done on the GPU. 
The latter fact allows a similar overall execution time for P2(Opt) and PI (within 
seconds). 

Table 5.1 

PI Vs. P2 for ^g eo (ST): The real values are equal to 4.918, 1.583 and 0.890 for dimensions 
one, five and ten respectively 



Simulated 


Dim 


Time 




Price 






Std Deviation 


Paths 


n 


Steps 


PI 


P2(=) 


P2(Opt) 


PI 


P2(=) 


P2(Opt) 


2 U) 


1 


10 


4.750 


4.826 


4.789 


0.213 


0.167 


0.160 


2 io 


1 


20 


4.729 


4.880 


4.800 


0.270 


0.226 


0.216 


2 io 


1 


30 


4.679 


4.909 


4.853 


0.270 


0.179 


0.190 


2 1U 


5 


10 


1.548 


1.681 


1.526 


0.071 


0.073 


0.067 


2 io 


5 


20 


1.632 


> 2.0 


1.588 


0.070 




0.048 


2 io 


5 


30 


1.650 


> 2.3 


1.619 


0.074 




0.069 


2 iu 


10 


10 


0.900 


1.112 


0.869 


0.039 


0.045 


0.044 


2 io 


10 


20 


0.921 


> 1.3 


0.936 


0.043 




0.047 


2 io 


10 


30 


0.908 


> 1.5 


0.949 


0.035 




0.046 


2 14 


1 


10 


4.738 


4.812 


4.807 


0.057 


0.046 


0.047 


2 14 


1 


20 


4.675 


4.869 


4.825 


0.047 


0.044 


0.043 


2 14 


1 


30 


4.638 


4.876 


4.856 


0.072 


0.059 


0.058 


2 14 


5 


10 


1.487 


1.526 


1.506 


0.057 


0.012 


0.012 


2 14 


5 


20 


1.504 


1.639 


1.534 


0.047 


0.021 


0.016 


2 14 


5 


30 


1.508 


> 1.8 


1.543 


0.072 




0.015 


2 14 


10 


10 


0.845 


0.938 


0.842 


0.013 


0.015 


0.012 


2 14 


10 


20 


0.901 


> 1.2 


0.893 


0.012 




0.014 


2 14 


10 


30 


0.923 


> 1.3 


0.916 


0.015 




0.016 



Because of the bad results obtained previously with P2(=), we eliminate this 
method and we only consider P2(Opt) and PI. In Table IST^l we analyze the Ameri- 
can put on minimum and the American call on maximum in two dimensions. As far as 
$ m i„ is concerned, P2(Opt) outperforms PI even when we use only 2 10 . Regarding 
&max, PI performs better than P2(Opt) for 2 10 trajectories which indicates that, 
because of the big variance produced by <i> m ax relatively to $ mm , the relation between 
N and N' is not well estimated. Simulating 2 14 trajectories, we obtain similar results 
for PI and P2(Opt) for $> max . 

Let us now study the parallel adaptability of MCM for parallel architectures. 
In Figure 15. 2\ we present the speedup of parallelizing MCM on the four cores of 
the CPU instead of implementing it on only one core. We notice that the speedup 
increases quickly according to the number of the simulated trajectories and it reaches 



4 We use OpenMP directives. 
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Table 5.2 

PI Vs. P2 for <S> 

min toTid &max-' The real values are equal to 8.262 and 21.15 respectively 



Simulated 


The 


Time 




Price 


Std Deviation 


Paths 


Payoff 


Steps 


PI 


P2(Opt) 


PI 


P2(Opt) 


2 10 




10 


7.734 


7.986 


0.190 


0.248 


2 io 


^ rain 


20 


7.618 


7.895 


0.257 


0.270 


2 io 




30 


7.564 


7.920 


0.224 


0.263 


2 10 




10 


21.03 


20.33 


0.66 


0.86 


2 io 




20 


20.46 


19.38 


0.61 


0.73 


2 io 


*&max 


30 


19.73 


18.13 


0.73 


0.93 


2 14 


^min 


10 


7.755 


8.088 


0.058 


0.067 


2 14 




20 


7.584 


8.098 


0.098 


0.052 


2 14 


^rain 


30 


7.467 


8.087 


0.082 


0.043 


2 14 


^raax 


10 


20.96 


20.91 


0.09 


0.24 


2 14 


*&max 


20 


20.58 


20.56 


0.16 


0.16 


2 14 


*&max 


30 


20.36 


20.05 


0.15 


0.22 



4.5 




2 1 , , , , , , , , 1 

2000 4000 6000 8000 10000 12000 14000 16000 18000 
Number of Simulated Paths 

Fig. 5.2. The speedup of using all the CPU cores according to the number of trajectories. 

a saturation state for > 9000 trajectories. For a large dimensional problem, the 
maximum speedup obtained is approximately equal to the number of logical cores on 
the CPU which indicates that MCM is very appropriate for parallel architectures. We 
point out, however, that our parallelization of MCM is done on the trajectorie^l, so 
the speedup is invariable according to dimensions and time steps. 

Regarding GPU implementation, we also use a path parallelization of simulations. 
In Figure 15.31 we present the speedup of parallelizing MCM on the GPU instead of 
implementing it on the four cores of the CPU. The speedup increases quickly not only 
according to the number of simulated trajectories, but also according to the dimension 
of the contract. The latter fact can be easily explained by the memory hierarchy of the 
GPU. The speedups provided in Figure [5731 prove, once again, the high adaptability 
of MCM on parallel architectures. 

5 which is the most natural procedure of parallelizing Monte Carlo. 
6 We use CUDA language. 
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Fig. 5.3. The speedup of using the GPU instead of the CPU cores according to the number of 
trajectories. 



6. Conclusion. In this article we provided, on the one hand, theoretical results 
that deal with the continuation computations using the Malliavin calculus and how one 
can reduce the Monte Carlo variance when simulating this continuation. On the other 
hand, we presented numerical results related to the accuracy of the prices obtained 
and the parallel adaptability of the MCM method on multi-core architectures. 

As far as the theoretical results are concerned, based on the Malliavin calculus, 
we provided a generalization of the value of the continuation for the multi-dimensional 
models with deterministic and non a constant triangular matrix u(t). Moreover, we 
pointed out that one can effectively reduce the variance by a simple conditioning 
method. Finally, we presented a less intuitive but very effective variance reduction 
method based on an appropriate choice of the number of trajectories used to approx- 
imate the quotient of two expectations. 

Regarding the numerical part, we proved that the one who looks for instantaneous 
simulations can obtain better and sufficiently good prices with MCM than with LS 
using only 2 10 trajectories. Also, unlike LS, our nonparametric variance reduction 
implementation of MCM does not require parametric regression. Thus we improve 
the results of the simulation by only increasing the number of trajectories. Finally, 
increasing the number of trajectories is time consuming but MCM can be effectively 
parallelized on multi-core CPUs and GPUs. Indeed, the MCM simulation of 2 14 
trajectories using the GTX 480 GPU can be performed within seconds (< 5s). 

Appendix. Proof of Lemma 12.21 The equality (|2.4ft can be easily proved. 
Indeed, using the chain rule 

n 

D k J(S t )=J2^(u)Sfd Xp f(S t ) 

p—k 

Besides, we assumed that p(u) = ct _1 (m) which completes the proof. 

□ 
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Proof of Lemma 12.31 Using duality (|1.2p we have 

E (h(S*)Fj:t k Jj Pik{u)dWi) = E fctk Ii D u [KS k s )F] Plk (u)du) 
= E {h{S k s ) £IL fc J, DiFp ik {u)du) + E (F J2i=k Jz h'{S k s )a kl {u) Plk {u)S k s du) 

Moreover, the fact that cr(u) and p(u) are two triangular matrices such that p kk (u) — 
l/a kk (u) simplifies the last term which can be also rewritten using the Malliavin 
derivative 

E [f jh'{S k s )S k s du^ = E (V 
This provides the required result. 

□ 
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